Synopsis: Implicit Columns

Let’s get introduced to another antipattern that focuses on listing columns while retrieving data from a database.

A PHP programmer once asked me for help troubleshooting the confusing result of a seemingly straightforward SQL query against their library database:

Retrieving Books table from the library database by joining this table to Authors

This query returned all book titles as NULL. Even stranger, when the PHP programmer ran a different query without joining the Authors, the result included the real book titles.

Retrieving Books table from the library database

I helped the programmer find the cause of their trouble: the PHP database extension they were using returned each row resulting from the SQL query as an associative array. For example, the programmer could access the Books.isbn column as $row["isbn"] using this extension. In their tables, both Books and Authors had a column called title (the latter was for titles like “Dr.” or “Rev.”). Now, we know that a single-result array element $row["title"] can store only one value. In this case, Authors.title constituted that array element. Most authors in the database had no title, so the result was that $row["title"]appeared to be NULL. When the query skipped the join to Authors, no conflict existed between the column names, and the book title constituted the array element as expected.

I told the programmer that the solution was to declare a column alias, i.e., to give one or the other title column a different name, so that each would have a separate entry in the array.

Retrieving title columns from Books and Authors tables using an alias to remove conflict

Their second question was, “How do I give one column an alias but also request other columns?” They wanted to continue using the wildcard (SELECT *) but apply an alias to one column covered by the wildcard.

Objective: Reduce typing#

Software developers, ironically, don’t seem to like to type.

One often hears programmers complaining, for example, about too much typing being required in writing all the columns used in an SQL query:

Retrieving data explicitly from the Bugs table

It’s no surprise that software developers gratefully use the SQL wildcard feature. The * symbol means every column, so the list of columns is implicit rather than explicit. This helps make queries more concise.

Retrieving data implicitly from the Bugs table

Likewise, when using INSERT, it seems smart to take advantage of the default setting, namely that the values apply to all the columns in the order they’re defined in the table.

Inserting a record in the Accounts table

Thus, it’s shorter to write the statement without listing the columns.

Inserting a record in the Accounts table without listing the columns in the query

Legitimate uses of the antipattern#

A well-justified use of wildcards is in ad hoc SQL when we’re writing quick queries to test a solution or as a diagnostic check of current data. A single-use query benefits less from maintainability.

The examples in this course use wildcards to save space and avoid distracting from the more interesting parts of the example queries. However, the authors of this course rarely use SQL wildcards in their personal projects of application code production.

If our application needs to run a query that adapts when columns are added, dropped, renamed, or repositioned, we may find it best to use wildcards. We need to be sure to plan for the extra work it takes to troubleshoot the pitfalls.

We can use wildcards for each table individually in a join query. To do this, we prefix the wildcard with the table name or alias. This allows us to shortlist the specific columns we need from one table while using the wildcard to fetch all the columns from the other table. For example:

Retrieving data from Bugs table and Accounts table

Keying in a long list of column names can be time-consuming. For some people, development efficiency is more important than runtime efficiency. Likewise, some programmers may place more priority on writing queries that are shorter and, therefore, more readable. Using wildcards reduces keystrokes and results in shorter queries, so it’s best to use wildcards if this is our priority.

We sometimes hear developers claim that a long SQL query passing from the application to the database server causes too much network overhead. In theory, query length could make a difference in some cases. But it’s more common that the rows of data that our query returns use much more network bandwidth than our SQL query string. We have to use our judgment about exceptional cases, but it’s best not to sweat the small stuff.

Solution: Divide and Conquer
Antipattern: A Shortcut That Gets You Lost
Mark as Completed
Report an Issue